Abstract

The Lempel-Ziv universal coding scheme is asymptotically optimal 
for the class of all stationary ergodic sources. A problem of robust- 
ness of this property under small violations of ergodicity is studied. 
A notion of deficiency of algorithmic randomness is used as a mea- 
sure of disagreement between data sequence and probability measure. 
We prove that universal compressing schemes from a large class are
non-robust in the following sense: if the randomness deficiency grows
arbitrarily slowly on initial fragments of an infinite sequence, then the
asymptotic optimality of any universal compressing algorithm can be
violated. Lempel-Ziv compressing algorithms are robust on infinite
sequences generated by ergodic Markov chains when the randomness
deficiency of their initial fragments of length n grows as o(n).

1 Introduction 

Well-known data compression schemes that are universal for classes of stationary ergodic sources, such as the Lempel-Ziv algorithms, are asymptotically optimal [1, 2]. In particular, for almost every infinite binary sequence $\omega_1\omega_2\dots$ generated by an ergodic source with unknown statistics, the average codeword length per bit of the input sequence tends to the entropy of the source as the block length tends to infinity. Robustness under small variations of parameters is an important property of coding algorithms. In this paper we consider the robustness of the asymptotic optimality property under small violations of the ergodicity of a source. The deficiency of algorithmic randomness $d_P(\omega_1 \dots \omega_n)$ is used as a measure of disagreement between a data sequence $\omega_1 \dots \omega_n \dots$ and a probability distribution $P$. This notion is considered in the Kolmogorov theory of algorithmic complexity and randomness [H HI |5]. In the framework of this theory we can formulate laws of probability theory, i.e. statements which hold almost surely, in a "pointwise" form, as statements which hold for individual objects. The set of Martin-Löf [B] random sequences is currently used as the standard set of such individual objects. The measure of this set is equal to 1, and laws of probability theory, such as the law of large numbers, the law of the iterated logarithm and others, hold for each sequence from this set. A sequence $\omega_1\omega_2\dots$ is algorithmically random with respect to a computable measure $P$ if and only if the randomness deficiency $d_P(\omega_1 \dots \omega_n)$ of its initial fragments of length $n$ is bounded as $n$ increases (the exact definition of the randomness deficiency will be given in Section 2).

The "robustness" of some probability laws under small violations of algorithmic randomness was studied in [3, 8]. These laws hold not only for random sequences but also for sequences from broader sets: the law of large numbers for the symmetric Bernoulli scheme holds for any sequence $\omega_1\omega_2\dots$ such that $d_P(\omega_1 \dots \omega_n) = o(n)$; the law of the iterated logarithm holds if $d_P(\omega_1 \dots \omega_n) = o(\log\log n)$. Small variations of these conditions imply violations of these laws. Robustness can fail for laws of a more general type. It is proved in [9] that Birkhoff's ergodic theorem is non-robust in this sense: an arbitrarily slow growth of the deficiency of randomness on initial fragments of an infinite sequence $\omega_1\omega_2\dots$ can lead to a violation of the statement of this theorem.

We prove that for any unbounded, nonnegative, and nondecreasing function $\sigma(n)$ there exists a stationary ergodic measure $P$ (computable with respect to $\sigma$) such that for any universal code there is an infinite binary sequence $\omega_1 \dots \omega_n \dots$ for which the inequality $d_P(\omega_1 \dots \omega_n) \le \sigma(n)$ holds for all sufficiently large $n$ and the asymptotic optimality of this code is violated on this sequence.

2 Algorithmic complexity and randomness 

The main notions and results on computability can be found in [10]. In this paper we consider algorithms working with constructive objects (that is, integers, rational numbers, or words in a finite alphabet). Let $B$ be a finite alphabet and $B^*$ the set of all words (finite sequences of letters) over it. The empty word $\Lambda$ is also an element of $B^*$. Let $l(x)$ be the length (number of letters) of a word $x \in B^*$. We write $x \subseteq y$ if a word $x$ is a prefix of a word $y$. Two words $x$ and $x'$ are comparable if $x \subseteq x'$ or $x' \subseteq x$. Let $bx$ denote the concatenation of $b$ and $x$ (i.e. all letters of $x$ follow all letters of $b$ in $bx$).

The Kolmogorov (algorithmic) complexity of a word $x \in B^*$ (with respect to a word $y \in B^*$) is equal to the length of the shortest binary codeword $p$ (i.e. $p \in \{0,1\}^*$) from which, given $y$, the word $x$ can be reconstructed:

$$K_\psi(x \mid y) = \min\{l(p) : \psi(p, y) = x\}.$$

By this definition the complexity depends on a partial computable function $\psi$, the method of decoding. A.N. Kolmogorov proved that there exists an optimal decoding algorithm $\psi$ such that for some positive constant $c$ (not depending on $x$, $y$, and $\psi'$)

$$K_\psi(x \mid y) \le K_{\psi'}(x \mid y) + 2K(\psi') + c \tag{1}$$

holds for any computable decoding function $\psi'$ and for all words $x$ and $y$. Here $K(\psi')$ is the length of the shortest program computing the values of $\psi'$. We fix some optimal decoding function $\psi$. The value $K(x \mid y) = K_\psi(x \mid y)$ is called the (conditional) Kolmogorov complexity of $x$ given $y$. The unconditional complexity of $x$ is defined as $K(x) = K(x \mid \Lambda)$.

It follows from [11] that there is no coding algorithm corresponding to $\psi$ (in the sense of Section 4) which computes from $x$ a codeword $p$ of minimal length such that $\psi(p) = x$.

We will use some properties of Kolmogorov complexity [3, 11]. The incompressibility property asserts that for any positive integers $n$ and $m$ the fraction of all sequences $x$ of length $n$ such that

$$K(x) < n - m \tag{2}$$

is less than $2^{-m}$. Indeed, the number of all $x$ satisfying this inequality does not exceed the number of all binary programs generating them. Since the length of any such program is less than $n - m$, the number of these programs is less than $2^{n-m}$.
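The counting argument behind (2) can be checked numerically. The following is a minimal sketch; the function names are illustrative and not part of the paper. It counts the binary words of length less than $n - m$ (the only candidates for short programs) and compares their number with $2^n$.

```python
def count_short_programs(n: int, m: int) -> int:
    """Number of binary words of length < n - m (all candidate programs)."""
    # sum_{k=0}^{n-m-1} 2^k = 2^(n-m) - 1 when n > m, and 0 otherwise
    return sum(2 ** k for k in range(max(n - m, 0)))


def max_fraction_compressible(n: int, m: int) -> float:
    """Upper bound on the fraction of length-n words x with K(x) < n - m."""
    return count_short_programs(n, m) / 2 ** n


for n, m in [(10, 2), (16, 4), (20, 8)]:
    frac = max_fraction_compressible(n, m)
    # The bound is always strictly below 2^(-m), as claimed in the text.
    print(n, m, frac, frac < 2 ** -m)
```

Each program generates at most one word, so the fraction of compressible words of length $n$ cannot exceed the printed bound.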

¹ We suppose that $\min\emptyset = +\infty$.

² We suppose that some universal programming language is fixed, and all decoding programs are written in this language (the constant $c$ depends on this language).

Let $x$ and $b$ be finite words. It is easy to construct a function which, given any program computing $bx$ and the length of $b$, computes the word $x$. Therefore,

$$K(x) \le K(bx) + 2\log l(b) + c \tag{3}$$

for any $x$, where $c$ is a positive constant not depending on $b$ and $x$.

We consider a probability space $(\Omega, F, P)$, where $\Omega = \{0,1\}^\infty$ and the Borel field $F$ is generated by the balls $\Gamma_x = \{\omega \in \Omega : x \subseteq \omega\}$, where $x \in \{0,1\}^*$. To define a probability measure $P$ on the space $\Omega$ it is sufficient to define consistent values $P(\Gamma_x) = P(x)$ such that $P(\Lambda) = 1$ and $P(x) = P(x0) + P(x1)$ for all $x$, where $xu$ denotes the word obtained from $x$ by appending $u$ on the right. After that, the function $P$ can be extended by the Kolmogorov extension theorem [12]. The uniform Bernoulli probability distribution on binary sequences is defined by $B_{1/2}(x) = 2^{-l(x)}$. A measure $P$ is called computable if there exists an algorithm which, given a finite sequence $x$ and a degree of accuracy, a rational $\epsilon > 0$, outputs a rational approximation to $P(x)$ with accuracy $\epsilon$.

The notion of an algorithmically random sequence is defined using an algorithmic analogue of a set of measure 0. Let $P$ be a computable probability measure on the set $\Omega$ of all infinite binary sequences.

A set $M \subseteq \Omega$ has $P$-measure 0 if for each rational $\epsilon > 0$ there is a sequence $x(1), x(2), \dots$ of elements of $\{0,1\}^*$ such that the set $U_\epsilon = \cup_j \Gamma_{x(j)}$ satisfies $M \subseteq U_\epsilon$ and $P(U_\epsilon) < \epsilon$. A $P$-null set is called effectively $P$-null if there exists a computable function $x(\epsilon, j)$ such that $M \subseteq U_\epsilon = \cup_j \Gamma_{x(\epsilon, j)}$ and $P(U_\epsilon) < \epsilon$ for each rational $\epsilon > 0$. It can be proved that for any computable measure $P$ there exists a largest (with respect to inclusion) effectively $P$-null set [U |5l [6]. The complement of this largest effectively $P$-null set is called the constructive support of the measure $P$. An infinite sequence $\omega \in \Omega$ is called algorithmically random with respect to the measure $P$ (random in the sense of Martin-Löf) if it belongs to the constructive support of the measure $P$.

Using a modification of decoding algorithms we can define the notion of an algorithmically random sequence in terms of complexity [H O [13]. Let us consider monotonic computable transformations of sequences. Let $A$ and $B$ be finite alphabets, and let a set $\hat\psi \subseteq A^* \times B^*$ be (recursively) enumerable (by means of some algorithm) and such that for any $(x, y), (x', y') \in \hat\psi$, if $x$ and $x'$ are comparable then $y$ and $y'$ are also comparable. Let also $A = \{0,1\}$.

³ All logarithms in what follows are to base 2.

The set $\hat\psi$ defines a decoding function that is monotonic with respect to $\subseteq$:

$$\psi(p) = \sup\{x : (p, x) \in \hat\psi\}. \tag{4}$$

The class of such monotonic functions determines the corresponding algorithmic complexity

$$Km_\psi(x) = \min\{l(p) : x \subseteq \psi(p)\}.$$

The corresponding optimal complexity $Km(x)$ differs from the complexity $K(x)$ by a term of order $\log l(x)$. We have

$$K(x) - 2\log l(x) - c \le Km(x) \le K(x) + 2\log K(x) + c \tag{5}$$

for all $x$, where $c$ is a positive constant (U E].

For any sequence $\omega$ denote by $\omega^n = \omega_1 \dots \omega_n$ its initial fragment of length $n$. The following fundamental assertion (first proved in [13]) holds.

Proposition 1 Let $P$ be some computable measure. Then

1) for any infinite sequence $\omega$ there exists a constant $c$ such that for all $n$ the inequality $Km(\omega^n) \le -\log P(\omega^n) + c$ holds; moreover, for any $m$

$$P(\cup\{\Gamma_x : -\log P(x) - Km(x) > m\}) \le 2^{-m};$$

2) a sequence $\omega$ is random with respect to the measure $P$ in the sense of Martin-Löf if and only if for some constant $c$ the inequality $Km(\omega^n) \ge -\log P(\omega^n) - c$ holds for all $n$.

This proposition shows that the asymptotic behaviour of the function

$$d_P(\omega^n) = -\log P(\omega^n) - Km(\omega^n)$$

can be used as a quantitative measure of nonrandomness of the sequence $\omega$. By Proposition 1 a sequence $\omega$ is algorithmically random with respect to a computable measure $P$ if and only if $\sup_n d_P(\omega^n) < \infty$. The value $d_P(\omega^n)$ is called the deficiency of algorithmic randomness of a word (finite sequence) $\omega^n$ with respect to the measure $P$ [H O [H].
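The deficiency $d_P$ is not computable, because $Km$ is not. As a rough illustration only, the sketch below replaces $Km(x)$ by the bit length of a `zlib` encoding and takes $P$ to be the uniform Bernoulli measure, so that $-\log P(x)$ is simply the number of bits of $x$. Both choices are assumptions made for this example and do not come from the paper. Since a compressor only gives an upper bound on complexity (up to an additive constant), the computed value should be read as a heuristic lower estimate of the deficiency.

```python
import os
import zlib


def deficiency_proxy(data: bytes) -> int:
    """Heuristic stand-in for d_P(x) under the uniform measure on bits.

    -log2 P(x) equals 8 * len(data); the zlib output length in bits is
    used in place of the uncomputable monotone complexity Km(x).
    """
    neg_log_p = 8 * len(data)
    compressed_bits = 8 * len(zlib.compress(data, 9))
    return neg_log_p - compressed_bits


regular = bytes(4096)        # 4096 zero bytes: very atypical for a fair coin
noisy = os.urandom(4096)     # OS randomness: typical for the uniform measure

print(deficiency_proxy(regular))   # large positive value
print(deficiency_proxy(noisy))     # near zero or slightly negative
```

The regular word is compressed to a few dozen bytes, so its proxy deficiency is close to its full bit length, while the random word is not compressed at all and the proxy stays bounded.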

Basic notions of ergodic theory can be found in [15] (see also Appendix 2 to this paper). The shortest codeword defining the Kolmogorov complexity possesses a property of "asymptotic optimality of compression".

⁴ Here by the supremum we mean the union of all comparable $x$ in one sequence.






Corollary 1 Let $P$ be an arbitrary computable stationary ergodic measure, and let $H$ be its entropy. Then for $P$-almost all infinite sequences $\omega$ the following limits exist and the corresponding equalities hold:

$$\lim_{n\to\infty}\frac{K(\omega^n)}{n} = \lim_{n\to\infty}\frac{Km(\omega^n)}{n} = \lim_{n\to\infty}\frac{-\log P(\omega^n)}{n} = H. \tag{6}$$

This corollary follows from Proposition 1, relation (5), and the Shannon-McMillan-Breiman theorem [15]. This corollary was first proved for $K(x)$ in [11]. In [16] a variant of (6) for algorithmically random sequences was obtained: for any infinite sequence $\omega$ random with respect to a computable ergodic measure $P$ with entropy $H$, relations (6) hold with the limit replaced by the upper limit.
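The last equality in (6) is the Shannon-McMillan-Breiman theorem, and it can be observed empirically in the simplest case. The sketch below assumes a Bernoulli source with parameter $p = 0.2$ and a fixed pseudo-random seed; both are choices made for this illustration only. It compares $-\frac{1}{n}\log P(\omega^n)$ with the entropy of the source.

```python
import math
import random


def binary_entropy(p: float) -> float:
    """Entropy (bits per symbol) of a Bernoulli(p) source, 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def empirical_rate(bits, p: float) -> float:
    """-(1/n) log2 P(omega^n) for the Bernoulli(p) measure P."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return -(ones * math.log2(p) + zeros * math.log2(1 - p)) / len(bits)


rng = random.Random(0)       # fixed seed, for reproducibility only
p = 0.2
bits = [1 if rng.random() < p else 0 for _ in range(20000)]

for n in (100, 1000, 20000):
    print(n, round(empirical_rate(bits[:n], p), 4), round(binary_entropy(p), 4))
```

For the other two limits in (6) no such direct check is possible, since $K$ and $Km$ are not computable.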

3 Non-robustness property of the universal 
data compression scheme 

Robustness under small variations of parameters is an important property of compressing algorithms. The following Theorem 1 can be interpreted as the assertion that the "optimal compression scheme" corresponding to Kolmogorov complexity is non-robust in the class of all stationary ergodic sources. As consequences of this theorem we obtain in Section 4 results on the non-robustness of computable universal coding schemes (see Propositions 2 and 3).

Theorem 1 For any nonnegative, nondecreasing, and unbounded function $\sigma(n)$ and for any real number $0 < \epsilon < 1/4$ there exist a stationary ergodic measure $P$ (computable with respect to $\sigma$) with entropy $0 < H \le \epsilon$ and an infinite binary sequence $\alpha$ such that

$$d_P(\alpha^n) \le \sigma(n) \tag{7}$$

for almost all $n$. Moreover,

$$\limsup_{n\to\infty}\frac{K(\alpha^n)}{n} \ge \frac14, \tag{8}$$

$$\liminf_{n\to\infty}\frac{K(\alpha^n)}{n} \le \epsilon. \tag{9}$$

Proof. Let $r > 0$ be a sufficiently small rational number. Let us consider the partition

$$\pi_0 = [0, \tfrac12) \cup (\tfrac12 + r, 1), \qquad \pi_1 = [\tfrac12, \tfrac12 + r]$$

of the semiopen interval $[0, 1)$ (the number $r$ will be specified later). Using the cutting and stacking method (basic definitions for this method will be given in Appendix 2) we will define an ergodic transformation $T$ of the interval $[0, 1)$ which will generate a stationary ergodic measure $P$ on the set $\Omega$. To define the measure $P$ consider

$$P(a_1 a_2 \dots a_n) = \lambda\{\omega \in [0,1) : T^i(\omega) \in \pi_{a_i},\ i = 1, 2, \dots, n\}, \tag{10}$$

where $a_1 a_2 \dots a_n$ is an arbitrary binary sequence and $\lambda$ is the uniform measure on the interval $[0, 1)$. The measure $P$ is extended to arbitrary Borel subsets of $\Omega$ in the natural way [12].
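Definition (10) can be illustrated with a much simpler transformation than the one built in the proof. In the sketch below $T$ is an irrational rotation of $[0, 1)$; this is a stand-in chosen only because it also preserves the uniform measure, and it is not the cutting and stacking transformation of the paper. The value of $r$, the rotation angle, and the grid size are likewise assumptions of the example. The measure of a word is estimated by the fraction of grid points whose trajectory has that name.

```python
from collections import Counter
import math

R = 0.05                          # hypothetical value of r
THETA = (math.sqrt(5) - 1) / 2    # hypothetical rotation angle


def T(w: float) -> float:
    """Stand-in transformation: rotation of [0, 1) by THETA."""
    return (w + THETA) % 1.0


def symbol(w: float) -> int:
    """1 if w lies in pi_1 = [1/2, 1/2 + r], otherwise 0."""
    return 1 if 0.5 <= w <= 0.5 + R else 0


def name(w: float, n: int) -> str:
    """Name a_1...a_n of the trajectory of w, with a_i = symbol(T^i(w))."""
    letters = []
    for _ in range(n):
        w = T(w)
        letters.append(str(symbol(w)))
    return "".join(letters)


def estimate_measure(n: int, grid: int = 20000) -> dict:
    """Estimate P(a_1...a_n) as the fraction of grid points with that name."""
    counts = Counter(name((j + 0.5) / grid, n) for j in range(grid))
    return {a: c / grid for a, c in counts.items()}


P1, P2, P3 = estimate_measure(1), estimate_measure(2), estimate_measure(3)
print(P1)
```

The estimates satisfy $P(x) = P(x0) + P(x1)$ exactly, because every trajectory name of length 3 extends a unique name of length 2, and the estimated measure of the word `1` is close to $\lambda(\pi_1) = r$.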

The ergodic transformation $T$ will be defined by a sequence of gadgets $\Delta_s$, $\Pi_s$, where $s = 0, 1, \dots$. Let the gadget $\Phi_s$ be the union of these two gadgets. At step $s$ we define an approximation $T_s = T(\Phi_s)$ of the transformation $T$ and the corresponding approximation $P^s$ of the measure $P$ analogously to (10). The transformation $T_s$ determines finite trajectories starting at points of internal intervals of these gadgets and finishing in the top intervals. Any such trajectory has a name, which is a word in the alphabet $\{0, 1\}$. By definition, for any word $a$ (for any set of words $D$) the number $P^s(a)$ (respectively, $P^s(D)$) is equal to the sum of the lengths of all intervals of the gadget $\Phi_s$ from which trajectories with names extending $a$ (extending words from $D$) start.

Since the function $\sigma$ is nondecreasing and unbounded, there exists a sequence of positive integers, computable with respect to $\sigma$, such that $0 < h_{-2} < h_{-1} < h_0 < h_1 < \dots$ and

$$\sigma(h_{i-1}) - \sigma(h_{i-2}) > -\log r + i + 13 \tag{11}$$

for all $i = 0, 1, \dots$. The gadgets will be defined by induction on steps. The gadget $\Delta_0$ is defined by cutting the interval $[\tfrac12 - r, \tfrac12 + r)$ into $2h_0$ equal parts and stacking them. Let $\Pi_0$ be the gadget defined by cutting the intervals $[0, \tfrac12 - r)$ and $(\tfrac12 + r, 1)$ into $2h_0$ equal parts and stacking them. The purpose of this definition is to construct initial gadgets of height $2h_0$ with supports satisfying $\lambda(\Delta_0) = 2r$ and $\lambda(\Pi_0) = 1 - 2r$.

The sequence of gadgets $\{\Delta_s\}$, $s = 0, 1, \dots$, will define an approximation of the uniform Bernoulli measure concentrated on the names of their trajectories. The sequence of gadgets $\{\Pi_s\}$, $s = 0, 1, \dots$, will define a measure with sufficiently small entropy. The gadget $\Pi_{s-1}$ will be extended at each step of the construction by one half of the gadget $\Delta_{s-1}$. After that, the independent cutting and stacking process will be applied to this extended gadget. This process eventually defines infinite trajectories of points of the interval $[0, 1)$. The sequence of gadgets $\{\Pi_s\}$, $s = 0, 1, \dots$, will be complete and will define the required measure $P$. Lemmas 2 and 3 will ensure that the transformation $T$ and the measure $P$ are ergodic.

The purpose of the construction is to provide conditions under which there exists a point of the interval $[0, 1)$ having an infinite trajectory with a name $\alpha$ satisfying (7), (8), and (9). To implement (8) we periodically extend initial fragments of $\alpha$ by names of trajectories of the gadgets $\Delta_{s-1}$ (for suitable $s$) which have maximal complexity. To bound the deficiency of randomness of the initial fragment of length $n$ by the value $\sigma(n)$, we impose, with the help of condition (11), a relation between the height of the gadget $\Delta_s$ and the measure of the support of this gadget. We will use Proposition [S] to define an extension with sufficiently small deficiency of randomness. To implement condition (9) it is sufficient to extend names, over long runs of the construction, using only trajectories of the gadgets $\{\Pi_s\}$, $s = 0, 1, \dots$. For any $s$ only a portion $\le r$ of the support of such a gadget belongs to the element $\pi_1$ of the partition. Then, by the ergodic theorem, most (sufficiently long) trajectories of this gadget will visit $\pi_1$ with this frequency, and the names of these trajectories will have a frequency of ones bounded by the small number $2r$, which ensures the bound (9).

Construction. Suppose that at step $s - 1$ ($s > 0$) the gadgets $\Delta_{s-1}$ and $\Pi_{s-1}$ have been defined. Cut the gadget $\Delta_{s-1}$ into two copies $\Delta'$ and $\Delta''$ of equal width (i.e. cut each column into two subcolumns of equal width) and join $\Pi_{s-1} \cup \Delta''$ into one gadget. Find a number $R_s$ and perform $R_s$-fold independent cutting and stacking of the gadget $\Pi_{s-1} \cup \Delta''$ and also of the gadget $\Delta'$ to obtain new gadgets $\Pi_s$ and $\Delta_s$ of height $2h_s$ such that the gadget $\Pi_{s-1} \cup \Delta''$ is $(1 - 1/s)$-well-distributed in the gadget $\Pi_s$. The required number $R_s$ exists by Lemma 3 (Appendix 2).

Properties of the construction. Define $T = T\{\Pi_s\}$. Since the sequence of the gadgets $\{\Pi_s\}$ is complete (i.e. $\lambda(\Pi_s) \to 1$ and $w(\Pi_s) \to 0$ as $s \to \infty$), the transformation $T$ is defined for $\lambda$-almost all $\omega$. The measure $P$ is defined by (10). The measure $P$ is stationary, since the transformation $T$ preserves the uniform measure $\lambda$. The measure $P$ is ergodic by Lemma 2 (Appendix 2), where $T_s = \Pi_s$, since the sequence of gadgets $\Pi_s$ is complete. Besides, the gadget $\Pi_{s-1} \cup \Delta''$ and the gadget $\Pi_{s-1}$ are $(1 - 1/s)$-well-distributed in $\Pi_s$ for any $s$. By construction

$$\lambda(\Delta_i) = 2^{-i+1} r \quad\text{and}\quad \lambda(\Pi_i) = 1 - 2^{-i+1} r \tag{12}$$

for all $i = 0, 1, \dots$.

This construction is algorithmically effective, so the measure $P$ is computable with respect to $\sigma$.

Let us prove that the entropy $H$ of the measure $P$ does not exceed $\epsilon$. Since $\lambda(\pi_1) = r$ and the transformation $T$ preserves the measure $\lambda$, by the ergodic theorem, for almost all points of the interval $[0, 1)$ the trajectory starting at the point visits the element $\pi_1$ with limiting frequency $r$ as the length of the initial fragment of the trajectory tends to infinity. Thus for any $\delta > 0$ and for all sufficiently large $n$ the measure $P$ of all sequences $x$ of length $n$ with a fraction of ones $\le 2r$ is $\ge 1 - \delta$. Let us consider any such sequence $x$ as an element of the finite set consisting of all sequences of length $n$ containing no more than $2rn < n/2$ ones. Then we obtain the standard upper bound

$$\frac{K(x)}{n} \le \frac1n \log\left(2rn\binom{n}{2rn}\right) + \frac{2\log n}{n} \le -3r\log r \tag{13}$$

for all sufficiently large $n$. By this inequality and by (6) we obtain the upper bound $H \le -3r\log r \le \epsilon$ for the entropy $H$ of the measure $P$, where $r$ is sufficiently small.
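The bound (13) and the choice of $r$ can be checked numerically. The sketch below is an illustration under assumptions of its own: it counts the words with at most $2rn$ ones exactly instead of using the estimate $2rn\binom{n}{2rn}$, and it picks $r$ by repeated halving, which is only one possible way to obtain a "sufficiently small" $r$. The function names and the sample values $\epsilon = 0.2$, $n = 2000$ are hypothetical.

```python
import math


def set_code_rate(n: int, r: float) -> float:
    """(1/n) * [log2 #{x of length n with at most 2rn ones} + 2 log2 n]."""
    k_max = math.floor(2 * r * n)
    size = sum(math.comb(n, k) for k in range(k_max + 1))
    return (math.log2(size) + 2 * math.log2(n)) / n


def target(r: float) -> float:
    """Right-hand side -3 r log2 r of the bound (13)."""
    return -3 * r * math.log2(r)


def choose_r(eps: float) -> float:
    """Halve r, starting from 1/8, until -3 r log2 r <= eps."""
    r = 0.125
    while target(r) > eps:
        r /= 2
    return r


r = choose_r(0.2)
print(r, target(r), set_code_rate(2000, r))
```

For these sample values the rate of the counting code already lies below $-3r\log r$, which in turn is below the chosen $\epsilon$.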

Let us prove that there exists an infinite sequence $\alpha$ for which the conclusion of Theorem 1 holds. We will define $\alpha$ by induction on steps $s$ as the union of an increasing sequence of initial fragments

$$\alpha(0) \subseteq \dots \subseteq \alpha(k) \subseteq \dots \tag{14}$$

For all sufficiently large $k$ the Kolmogorov complexity of the initial fragment $\alpha(k)$ will be small if $k$ is odd, and large otherwise.

⁵ For any $\omega \in [0, 1)$ the frequency of visiting $\pi_1$ by the trajectory starting at $\omega$ is equal to $(1/l)\sum_{i=1}^{l}\chi_1(T^i\omega)$, where $l$ is the length of this trajectory, $\chi_1(r) = 1$ if $r \in \pi_1$, and $\chi_1(r) = 0$ otherwise.

Define $\alpha(0)$ to be the $\Pi_0$-name of some trajectory of length $\ge h_0$ such that $d_P(\alpha(0)) \le 2$. This is possible by Proposition (Appendix 1). Define $s(-1) = s(0) = 0$.

Induction hypotheses. Suppose that $k > 0$, a sequence $\alpha(0) \subseteq \dots \subseteq \alpha(k-1)$ is already defined, and for some step $s(k-1)$ of the construction the word $\alpha(k-1)$ is the $\Pi_{s(k-1)}$-name of a trajectory of some point from the support of the gadget $\Pi_{s(k-1)}$. We suppose that $l(\alpha(k-1)) \ge h_{s(k-1)}$, and if $k$ is odd then $d_P(\alpha(k-1)) \le \sigma(h_{s(k-2)}) - 4$. If $k$ is even then $d_P(\alpha(k-1)) \le \sigma(h_{s(k-2)})$ and $P^{s(k-1)}(\alpha(k-1)) > (1/8)P(\alpha(k-1))$.

Let us consider any odd $k$. Define $a = \alpha(k-1)$.

Let us consider the set of all intervals (from columns) of the gadget $\Pi_{s-1}$ with the following property: for any trajectory starting from this interval with $\Pi_{s-1}$-name extending $a$, the frequency of visiting the element $\pi_1$ of the partition is $\le 2r$. For the name $\gamma$ of any such trajectory the inequality

$$\frac{K(\gamma)}{l(\gamma)} \le -3r\log r \le \epsilon \tag{15}$$

(analogous to (13)) holds, where $r$ is sufficiently small. As in the proof of the inequality $H \le \epsilon$, we obtain by the ergodic theorem that for all sufficiently large $s$ the total length of all intervals from this set is $\ge (1/2)P(a)$.

Let us consider an arbitrary column of the gadget $\Pi_s$. Divide all its intervals into two equal parts: the upper part and the lower part. We will consider only intervals from the lower part. Any trajectory starting from a point of an interval from this part has length $\ge h_s$. Fix some $s$ as above and define $s(k) = s$. Let $U_s(a)$ be the set of all intervals from the lower part of the gadget $\Pi_s$ such that trajectories starting from them and having $\Pi_s$-names extending $a$ satisfy the inequality (15). Let $D_a$ be the set of all $\Pi_s$-names of all these trajectories. The inequality $P^s(D_a) \ge (1/4)P(a)$ holds for the total length $P^s(D_a)$ of all intervals from $U_s(a)$.

Define D = Uxi^n^x- It is easy to prove that a set Ca ^ Da exists 
such that P{Ca) > (l/8)P(Z)j and P'{b) > (1/8)P(6) for all b e Ca. By 
Proposition E] (Appendix 1) there exists $b \in C_a$ such that $d_P(b^j) \le d_P(a) + 4$ for $l(a) \le j \le l(b)$. Define $\alpha(k) = b$. By the induction hypotheses the inequalities $d_P(a) \le \sigma(h_{s(k-2)}) - 4$ and $l(a) \ge h_{s(k-1)} \ge h_{s(k-2)}$ hold. Then $d_P(b^j) \le \sigma(h_{s(k-2)}) \le \sigma(l(a)) \le \sigma(j)$ for all $l(a) \le j \le l(b)$.

Notice that $l(b) \ge h_{s(k)}$, since any trajectory defining $b$ starts from an interval of the lower part of the gadget $\Pi_s$, and the height of this gadget is $\ge 2h_s$. The remaining induction hypotheses are proved above.

Condition (9) is true, since condition (15) holds for an infinite number of initial fragments $\alpha(k)$ of the sequence $\alpha$.

Let $k$ be even. Put $b = \alpha(k-1)$. Let $s = s(k-1) + 1$. Define $s(k) = s$.

Let us consider an arbitrary column of the gadget $\Delta_{s-1}$. Divide all its intervals into two equal parts: the upper part and the lower part. Any trajectory starting from an interval of the lower part has length $\ge L/2$, where $L \ge 2h_{s-1}$ is the height of the gadget $\Delta_{s-1}$. The uniform measure of all such intervals is equal to $\frac12\lambda(\Delta_{s-1})$. Let us consider the names $x^{L/2}$ of the initial fragments of length $L/2$ of all these trajectories. By the incompressibility property of Kolmogorov complexity (2) and by the choice of $L$, the uniform Bernoulli measure of all sequences of length $L/2$ satisfying
K{x^l^) 2 
< 1 



is less than 2"^/^"-'^ < 1/4. Names of initial fragments (of length L/2) of the 
rest part of trajectories starting from intervals of lower part of the gadget 
As_i satisfy 

^(^^>l-^ (16) 

As noted in Appendix 2 (Remark 1), for any step $s$ of the construction the equality $P^{s-1}(x) = 2^{-l(x)}\lambda(\Delta_{s-1})$ holds for the name $x$ of any trajectory of the gadget $\Delta_{s-1}$. We conclude from this equality that the uniform measure of all intervals from the lower part of the gadget $\Delta_{s-1}$ from which trajectories with names (more precisely, with initial fragments $x^{L/2}$ of such names) satisfying (16) start is at least $\frac14\lambda(\Delta_{s-1})$.
By (12) we have

$$\gamma = \frac{\lambda(\Delta'')}{\lambda(\Pi_{s-1})} = \frac{\lambda(\Delta_{s-1})}{2\lambda(\Pi_{s-1})} = \frac{2^{-s+1}r}{1 - 2^{-s+2}r}. \tag{17}$$

Let us consider the $R_s$-fold independent cutting and stacking of the gadget $\Pi_{s-1} \cup \Delta''$ in more detail. First, we cut this gadget into $R_s$ copies. When we stack the next copy on the already defined part of the gadget, the portion of all trajectories of any column from the previously constructed part which go to a subcolumn from the gadget $\Delta''$ is equal to

$$\frac{\lambda(\Delta'')}{\lambda(\Pi_{s-1}) + \lambda(\Delta'')} = \frac{\gamma}{1 + \gamma}. \tag{18}$$

This is true since, by definition, any column is covered by a set of subcolumns with the same distribution as the gadget $\Pi_{s-1} \cup \Delta''$ has. The total length of all intervals of the gadget $\Pi_{s-1}$ such that trajectories with names extending $b$ start from these intervals is equal to $P^{s-1}(b)$.

Consider the lower half of all subintervals generated by cutting and stacking of the gadget $\Pi_{s-1}$ in which trajectories with $\Pi_{s-1}$-names extending $b$ start. The length of any such trajectory (in $\Pi_s$) is at least $h_s$. For this reason one of the induction hypotheses will hold. The measure of the remaining subintervals is halved. After that, we consider the subset of these subintervals such that trajectories starting from subintervals of this subset go into subcolumns of the gadget $\Delta''$. The measure of the remaining subintervals is multiplied by the factor $\gamma/(1 + \gamma)$. Further, consider the subintervals from the remaining part generating trajectories whose names have fragments in $\Delta''$ satisfying (16). The measure of the remaining part is at least $1/4$ of the previously considered part. We obtain this bound from the previous estimate of the portion of subintervals generating trajectories in the gadget $\Delta''$ of length $\ge L/2$ satisfying (16). Let $D_b$ be the set of all $\Pi_s$-names of all
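trajectories starting from the subintervals remaining after these selection operations. Then

$$P^s(D_b) \ge \frac12 \cdot \frac{\gamma}{1 + \gamma} \cdot \frac14 P^{s-1}(b) = \frac{\gamma}{8(1 + \gamma)} P^{s-1}(b). \tag{19}$$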

The name of any such trajectory has an initial fragment of the form $bx'x^{L/2}$, where $x'x^{L/2}$ is the name of the fragment of this trajectory corresponding to its path in the gadget $\Delta_{s-1}$. The word $x^{L/2}$ has length $L/2$ and satisfies (16). The word $x'$ is the name of the fragment of the trajectory which goes from a lower interval to an interval generating a trajectory with name $x^{L/2}$. We have $l(bx'x^{L/2}) \le 2L = 4l(x^{L/2})$. By (3) and (16) we obtain for these initial fragments of sufficiently large length

K{bx'x^/^) K{x^/^) -2logl{bx') 11 
/(6x'x^/2) - 4/(a;^/2) - 4 ~ 

We have $P^{s-1}(b) > (1/8)P(b)$ by the induction hypothesis. After that, taking into account that $\gamma < 1$, we deduce from (19)

$$P(D_b) \ge P^s(D_b) \ge \frac{\gamma}{128} P(b).$$

⁶ Recall that $L$ ($\ge 2h_{s-1}$) is the height of the gadgets $\Pi_{s-1}$, $\Delta_{s-1}$.

By Proposition O there exists $c \in D_b$ such that

$$d_P(c^j) \le d_P(b) + 1 - \log\frac{\gamma}{128} \le d_P(b) + (\sigma(h_{s-1}) - \sigma(h_{s-2}) - 12) + 8 \le \sigma(h_{s-1}) - 4 = \sigma(h_{s(k-1)}) - 4$$

for all $l(b) \le j \le l(c)$. Here we have $d_P(b) \le \sigma(h_{s(k-2)}) \le \sigma(h_{s-2})$ by the induction hypothesis. We also used inequality (11). Besides, by the induction hypothesis we have $l(b) \ge h_{s-1}$. Therefore,

$$d_P(c^j) \le \sigma(h_{s-1}) \le \sigma(l(b)) \le \sigma(j)$$

for $l(b) \le j \le l(c)$. Define $\alpha(k) = c$. It is easy to see that all induction hypotheses are true for $\alpha(k)$.

An infinite sequence $\alpha$ is defined by the sequence of initial fragments (14). We proved that $d_P(\alpha^j) \le \sigma(j)$ for all $j \ge l(\alpha(1))$.

By the construction there are infinitely many initial fragments of the sequence $\alpha$ satisfying (20). The sequence $h_s$, where $s = 0, 1, \dots$, is monotonically increasing. So condition (8) holds. $\triangle$

4 Non-robustness property of universal codes 

Let $A$ and $B$ be finite alphabets. By a code we mean a computable family of functions $\phi_n : A^n \to B^*$, where $n = 1, 2, \dots$. Suppose that $B = \{0, 1\}$. We will consider decodable codes. A computable family of decoding functions $\psi_n : \phi_n(A^n) \to A^n$ such that $\alpha = \psi_n(\phi_n(\alpha))$ for all $n$ and for all $\alpha \in A^n$ is associated with this code. A separating property of the code is required: there must exist an algorithm decoding any sequence of concatenated codewords. Prefix codes satisfy this requirement: under the prefix method of coding any two codewords $\phi_n(\alpha)$ and $\phi_n(\alpha')$ are incomparable. For any code $\{\phi_n\}$ the compression ratio $\rho_{\phi_n}(\alpha^n) = l(\phi_n(\alpha^n))/(n\log|A|)$ of an input word $\alpha^n \in A^n$ is defined. We suppose for simplicity that $A = \{0, 1\}$.
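The notions of this paragraph (prefix property, decoding of concatenated codewords, compression ratio) can be made concrete with a toy code. The sketch below is illustrative only: the table `TABLE`, which encodes blocks of two input letters, and all function names are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical prefix code on blocks of two binary letters.
TABLE = {"00": "0", "01": "10", "10": "110", "11": "111"}


def is_prefix_free(codewords) -> bool:
    """True if no codeword is a proper prefix of another codeword."""
    words = sorted(set(codewords))
    # In sorted order a prefix is immediately followed by one of its extensions.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode_concatenation(stream: str, table: dict) -> list:
    """Split a concatenation of codewords and return the decoded blocks."""
    inverse = {code: block for block, code in table.items()}
    blocks, current = [], ""
    for bit in stream:
        current += bit
        if current in inverse:        # unambiguous for a prefix code
            blocks.append(inverse[current])
            current = ""
    if current:
        raise ValueError("stream does not end on a codeword boundary")
    return blocks


def compression_ratio(codeword: str, n: int, alphabet_size: int = 2) -> float:
    """rho = l(phi_n(alpha^n)) / (n log2 |A|)."""
    return len(codeword) / (n * math.log2(alphabet_size))


word = "000001"
encoded = "".join(TABLE[word[i:i + 2]] for i in range(0, len(word), 2))
print(encoded, decode_concatenation(encoded, TABLE), compression_ratio(encoded, len(word)))
```

The separating property is what allows `decode_concatenation` to cut the stream greedily: as soon as the bits read so far form a codeword, that codeword cannot be the beginning of a longer one.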

In [17, 18] codes universal in the mean for some classes of sources were considered; in [1, 2] a code universal almost everywhere for the class of all stationary ergodic sources was defined. We consider codes universal almost everywhere.

⁷ The function is computable in both arguments $n$ and $\alpha$.

A code $\{\phi_n\}$ is called universal with respect to a class of stationary ergodic sources if for any computable stationary ergodic measure $P$ from this class

$$\lim_{n\to\infty}\rho_{\phi_n}(\omega^n) = H \tag{21}$$

holds for $P$-almost every infinite sequence $\omega = \omega_1\omega_2\dots$, where $H$ is the entropy of the measure $P$. There exist several types of the Lempel-Ziv universal coding scheme [1, 2]. Let us recall two of them.

A coding algorithm is fed with a word $\omega_1 \dots \omega_N$ of length $N$. In the first variant of the algorithm the sequence of letters $\omega_1\omega_2 \dots \omega_N$ is read from the left and divided into subblocks as follows: a pointer for the $k$-th subblock is inserted after $\omega_{i(k)}$ if the subblock $\omega_{i(k-1)+1}\omega_{i(k-1)+2} \dots \omega_{i(k)-1}$ was already seen between previous pointers and the subblock $\omega_{i(k-1)+1}\omega_{i(k-1)+2} \dots \omega_{i(k)}$ was not seen. To encode a new subblock it is sufficient to store the coordinate of the beginning of the sequence $\omega_{i(k-1)+1}\omega_{i(k-1)+2} \dots \omega_{i(k)-1}$, its length, and the new letter $\omega_{i(k)}$.

The same idea is used in the second variant of the algorithm, but a subblock $\omega_{i(k-1)+1}\omega_{i(k-1)+2} \dots \omega_{i(k)-1}$ is deemed to have appeared if it occurs at all, not necessarily between pointers.
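The first variant can be sketched as incremental parsing in the style usually called LZ78; reading the description above this way is an assumption of this example, as are the function names and the bit-counting convention. Each new subblock is encoded as the number of an earlier subblock plus one new letter, which replaces the "coordinate and length" of the description by a single phrase number.

```python
def lz_parse(word: str) -> list:
    """Split word into subblocks, each an earlier subblock plus one new letter.

    Returns pairs (number of the earlier subblock, new letter); number 0
    denotes the empty subblock.
    """
    index = {"": 0}
    pairs, current = [], ""
    for letter in word:
        if current + letter in index:
            current += letter                 # still inside a known subblock
        else:
            pairs.append((index[current], letter))
            index[current + letter] = len(index)
            current = ""
    if current:                               # unfinished last subblock
        pairs.append((index[current], ""))
    return pairs


def lz_unparse(pairs: list) -> str:
    """Inverse of lz_parse: rebuild the word from the list of pairs."""
    phrases, out = [""], []
    for ref, letter in pairs:
        phrase = phrases[ref] + letter
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


def code_length_bits(pairs: list) -> int:
    """Pointer of the k-th pair costs ceil(log2 k) bits, plus 1 bit per letter."""
    return sum((k - 1).bit_length() + 1 for k in range(1, len(pairs) + 1))


pairs = lz_parse("0010111010")
print(pairs, lz_unparse(pairs), code_length_bits(pairs))
```

On short inputs the code is longer than the input; the compression ratio approaches the entropy only asymptotically, which is the universality property discussed in this section.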

The following proposition on the non-robustness of universal codes is an analogue of Theorem 1.

Proposition 2 For any nonnegative, nondecreasing, and unbounded function $\sigma(n)$ and for any real number $0 < \epsilon < 1/4$ there exists a stationary ergodic measure $P$ (computable with respect to $\sigma$) with entropy $0 < H \le \epsilon$ such that for each universal (for the class of all stationary ergodic sources) code $\{\phi_n\}$ there exists an infinite binary sequence $\alpha$ such that $d_P(\alpha^n) \le \sigma(n)$ for almost all $n$ and

$$\limsup_{n\to\infty}\rho_{\phi_n}(\alpha^n) \ge \frac14, \tag{22}$$

$$\liminf_{n\to\infty}\rho_{\phi_n}(\alpha^n) \le \epsilon. \tag{23}$$

Proof. For any $n$ the decoding algorithm $\psi_n$ of the code $\{\phi_n\}$ is defined by $\log n + O(1)$ bits. Then we have

$$K(\alpha^n) \le l(\phi_n(\alpha^n)) + O(\log n). \tag{24}$$

Inequality (22) follows from inequality (8) of Theorem 1. The proof of inequality (23) is analogous to the proof of inequality (9) of Theorem 1. We must only replace condition (15) from the proof of Theorem 1 by $l(\phi_n(\omega^n))/n \le \epsilon$ and take into account property (21) of asymptotic optimality of the code. $\triangle$

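Let $\{\phi_n\}$ be a code. Under the block realization of the code any sequence of letters $\omega^n = \omega_1 \dots \omega_n$ is divided into consecutive blocks $\omega^n = \tilde\omega_1 \dots \tilde\omega_k$, where $n = (k-1)N + q$, $0 \le q < N$, and $\tilde\omega_i = \omega_{(i-1)N+1} \dots \omega_{iN}$, $i = 1, 2, \dots, k-1$, is a block of length $N$, and $\tilde\omega_k = \omega_{(k-1)N+1} \dots \omega_{(k-1)N+q}$ is the last incomplete block. Any block $\tilde\omega_i$ is encoded by the binary word $\phi_N(\tilde\omega_i)$. In asymptotic estimates (when $n \to \infty$) the method of coding the last block $\tilde\omega_k$ is inessential (we fix some such method). We write $\phi_N(\omega^n) = \phi_N(\tilde\omega_1) \dots \phi_N(\tilde\omega_k)$ and $\rho_{\phi_N}(\omega^n) = l(\phi_N(\omega^n))/n$.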
It is proved in [2] (Theorem 4) that for any stationary ergodic measure $P$ with entropy $H$ the property of asymptotic optimality holds for the block realization of the Lempel-Ziv code $\{\phi_n\}$ with blocks of length $N$. The relation

$$\lim_{N\to\infty}\limsup_{n\to\infty}\rho_{\phi_N}(\omega^n) = H \tag{25}$$

holds for $P$-almost all $\omega$. We can prove that equality (25) holds also for any sequence $\omega$ random in the sense of Martin-Löf with respect to the measure $P$ (i.e. when $d_P(\omega^n) = O(1)$ as $n \to \infty$).

The following analogue of Theorem 1 holds for the block realization of codes with block length $N$ and for codes using a sliding window of length $N$ (when a new letter of the codeword depends only on the $N$ preceding letters of the input word).

Proposition 3 For any nonnegative, nondecreasing, and unbounded function $\sigma(n)$ and for any real number $0 < \epsilon < 1/4$ there exists a stationary ergodic measure $P$ (computable with respect to $\sigma$) with entropy $0 < H \le \epsilon$ such that for each universal (for the class of all stationary ergodic sources) code $\{\phi_n\}$ or for each universal code with a sliding window of length $N$ there exists an infinite binary sequence $\alpha$ such that $d_P(\alpha^n) \le \sigma(n)$ for almost all $n$ and for any $N$

$$\limsup_{n\to\infty}\rho_{\phi_N}(\alpha^n) \ge \frac14, \tag{26}$$

and for all sufficiently large $N$

$$\liminf_{n\to\infty}\rho_{\phi_N}(\alpha^n) \le \epsilon. \tag{27}$$

The proof of this proposition is a slight complication of the proof of Proposition 2.






Note that this property also holds for adaptive coding schemes,
i.e. when the coding algorithm depends on the preceding blocks.

Using Theorem 1 it can be proved that the non-robustness property holds for
other well-known universal codes. For example, in [19] a universal
forecasting measure $p(\omega_1 \dots \omega_n)$ and a code $\psi_n$ such that
$l(\psi_n(\omega_1 \dots \omega_n)) \le -\log p(\omega_1 \dots \omega_n) + 1$ were defined. This measure is
defined as a mixture

$p(y) = \sum_{k=1}^{\infty} \lambda_k p_k(y)$

of measures $p_k$ universal for Markov sources of order $k$, constructed in the
theory of universal coding [20]. Here $\lambda_k$ is some optimal probability
distribution on the positive integers (it can be defined as
$\lambda_k = c k^{-1} \log^{-2} k$, where $c$ is a constant) and $\phi(k)$ is the corresponding
codeword for a positive integer $k$: $l(\phi(k)) = \log k + O(\log\log k)$. In [21]
a universal code $\psi(u) = \phi(l(u)) \psi_{l(u)}(u)$, where $u \in B^*$, was constructed.
The universality conditions for the measure $p$ and for the code $\psi$ are the
following: for any stationary measure $\mu$ with entropy $H(\mu)$, for $\mu$-almost all
$\omega \in \Omega$ the mean error of the forecast by the measure $p$ tends to zero,

$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \log \frac{\mu(\omega_i | \omega^{i-1})}{p(\omega_i | \omega^{i-1})} = \lim_{n \to \infty} \frac{1}{n} \log \frac{\mu(\omega^n)}{p(\omega^n)} = 0$, (28)

and $\lim_{n \to \infty} l(\psi(\omega^n))/n = \lim_{n \to \infty} -\log p(\omega^n)/n = H(\mu)$. It is easy to derive
from the definition of the deficiency of randomness that condition (28)
is "robust under violation of randomness"; more precisely, it holds for any
computable stationary measure $\mu$ and for any infinite sequence $\omega$ such that
$d_\mu(\omega^n) = o(n)$ as $n \to \infty$. But the corresponding universal code $\psi$ is
non-robust for the class of all stationary ergodic sources. Since a decoding
algorithm exists for the code $\psi$, we have

$K(\omega_1 \dots \omega_n) \le l(\psi(\omega_1 \dots \omega_n)) + O(1) \le -\log p(\omega_1 \dots \omega_n) + O(\log n)$.

Then by Proposition 2 there exists an $\alpha \in \Omega$ such that the conclusion of
this proposition holds; in particular, condition (22) holds. Property (23)
can be obtained as in the proof of Proposition 2 by universality of the code.
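A mixture of the kind described above can be illustrated numerically. The sketch below makes several assumptions that are ours and not taken from [19], [20] or [21]: the component measures are order-$k$ Krichevsky-Trofimov estimators (a standard stand-in for measures universal for Markov sources of order $k$), the infinite sum is truncated at `max_order`, and the weights use $k+1$ in place of $k$ so that the logarithm is nonzero at $k = 1$.

```python
import math
from collections import defaultdict

def kt_log2_prob(bits, k):
    """log2-probability of `bits` under an order-k KT estimator (assumed p_k)."""
    counts = defaultdict(lambda: [0, 0])
    total = 0.0
    for i, b in enumerate(bits):
        ctx = tuple(bits[max(0, i - k):i])      # up to k preceding bits
        c = counts[ctx]
        total += math.log2((c[b] + 0.5) / (c[0] + c[1] + 1.0))
        c[b] += 1
    return total

def weights(max_order):
    """Weights proportional to 1 / ((k+1) log^2 (k+1)), normalized to sum to 1."""
    raw = [1.0 / ((k + 1) * math.log2(k + 1) ** 2)
           for k in range(1, max_order + 1)]
    c = 1.0 / sum(raw)
    return [c * r for r in raw]

def mixture_log2_prob(bits, max_order):
    """log2 of sum_k lambda_k p_k(bits), computed stably in the log domain."""
    lam = weights(max_order)
    terms = [math.log2(lam[k - 1]) + kt_log2_prob(bits, k)
             for k in range(1, max_order + 1)]
    m = max(terms)
    return m + math.log2(sum(2.0 ** (t - m) for t in terms))

def code_length_bound(bits, max_order):
    """Upper bound -log2 p(bits) + 1 on the code length, as in the text."""
    return -mixture_log2_prob(bits, max_order) + 1
```

The mixture is never worse than its best component by more than $-\log \lambda_k$ bits, which is the mechanism behind the $O(\log n)$ term in the complexity bound.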

The property of asymptotic optimality can be robust for narrower
classes of stationary ergodic sources, such as i.i.d. sequences of random
variables or stationary Markov chains.

Proposition 4 Let $P$ be an arbitrary computable probability measure
representing a stationary ergodic Markov chain of fixed order (in particular,
an i.i.d. sequence of random variables), let $H$ be its entropy, and let $\{\phi_n\}$
be a variant of the Lempel-Ziv compressing algorithm. Then for any infinite
sequence $\omega$, if $d_P(\omega^n) = o(n)$ then equality (21) holds, and for the block
realization of this compressing scheme equality (25) holds.

Footnote: We give some simplification of the results of [19], [21].

The proof is based on the constructive character of the proofs of the results
from [2]. Birkhoff's ergodic theorem is also used in this proof; in the case
of Markov sources it is a variant of the law of large numbers. This law holds
for an individual sequence $\omega$ when $d_P(\omega^n) = o(n)$ as $n \to \infty$.

5 Appendix 1 

Bounded increase of the deficiency of randomness. In the proof of
Theorem 1 a proposition on a bounded increase of the deficiency of randomness
was used. Let $P$ be a measure, $P(x) \ne 0$, and let a set $A$ consist of words $y$
such that $x \subseteq y$. Recall that $P(A) = P(\cup\{\Gamma_y : y \in A\})$ for any $A \subseteq \{0,1\}^*$.
Define $P(A|x) = P(A)/P(x)$.

Proposition 5 Let $P$ be a measure, $x$ be a word, $P(x) \ne 0$, and let a set $A$
consist of words $y$ such that $x \subseteq y$ and $P(A) > 0$. Then for any $0 < \mu < 1$ a
subset $A' \subseteq A$ exists such that $P(A') > \mu P(A)$ and

$d_P(y^n) \le d_P(x) - \log(1 - \mu) - \log P(A|x)$

for all $y \in A'$ and $l(x) \le n \le l(y)$.

Proof. We will use in the proof the notion of a supermartingale [12]. A function
$M$ is called a $P$-supermartingale if it is defined on $\{0,1\}^*$ and satisfies the
conditions:
$M(\Lambda) \le 1$;
$M(x) \ge M(x0)P(0|x) + M(x1)P(1|x)$ for all $x$,
where $P(\nu|x) = P(x\nu)/P(x)$ for $\nu = 0, 1$ (we put here $0/0 = 0 \cdot \infty = 0$).

A supermartingale $M$ is lower semicomputable if the set $\{(r,x) : r < M(x)\}$,
where $r$ is a rational number, is the range of some computable function. We will
consider only nonnegative supermartingales.
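The two defining conditions can be checked mechanically on all words up to a fixed length. The following sketch is an illustration only; the function names and the choice of a Bernoulli measure for $P$ are our assumptions. It tests the ratio $M(x) = 2^{-l(x)}/P(x)$, which is the simplest instance of a quotient of a semimeasure by $P$.

```python
from itertools import product

def bernoulli(p):
    """P(x) for the Bernoulli(p) measure on binary words given as '0'/'1' strings."""
    def P(x):
        ones = x.count("1")
        return p ** ones * (1 - p) ** (len(x) - ones)
    return P

def is_supermartingale(M, P, depth, tol=1e-9):
    """Check M(empty) <= 1 and M(x) >= M(x0)P(0|x) + M(x1)P(1|x) for l(x) < depth."""
    if M("") > 1 + tol:
        return False
    for n in range(depth):
        for letters in product("01", repeat=n):
            x = "".join(letters)
            if P(x) == 0:
                continue                      # convention 0/0 = 0: nothing to check
            rhs = sum(M(x + v) * P(x + v) / P(x) for v in "01")
            if M(x) < rhs - tol:
                return False
    return True

P = bernoulli(0.3)
ratio = lambda x: 2.0 ** (-len(x)) / P(x)     # uniform measure divided by P
doubling = lambda x: 2.0 ** len(x)            # grows too fast to be a supermartingale
```

Here `is_supermartingale(ratio, P, 6)` holds (with equality in the second condition, up to rounding), while `doubling` fails the test under the uniform measure.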

Let us prove that the deficiency of randomness is bounded by a logarithm of 
some lower semicomputable supermartingale. 

Lemma 1 Let $P$ be a computable probability measure. Then there exists a lower
semicomputable $P$-supermartingale $M$ such that $d_P(x) \le \log M(x)$ for all $x$.

Proof. Let an optimal function $\psi$ (satisfying the condition from the definition
of monotone complexity) define the monotone complexity $Km(x)$. Define

$Q(\alpha) = B_{1/2}(\cup\{\Gamma_p : \alpha \subseteq \psi(p)\})$, (29)

where $B_{1/2}(\Gamma_\alpha) = 2^{-l(\alpha)}$ is the uniform Bernoulli measure on the set of all binary
sequences. It is easy to verify that $Q(\Lambda) \le 1$ and $Q(\alpha) \ge Q(\alpha 0) + Q(\alpha 1)$ for all
words $\alpha$. Then the function $M(\alpha) = Q(\alpha)/P(\alpha)$ is a $P$-supermartingale.

Since for any $\alpha$ the shortest $p$ such that $\alpha \subseteq \psi(p)$ is an element of the set
from (29), we have the inequality $Q(\alpha) \ge 2^{-Km(\alpha)}$, and so $d_P(\alpha) \le \log M(\alpha)$. △

Let $d_P(x) \le \log M(x)$, where $M$ is a lower semicomputable $P$-supermartingale.
Let us define a set

$A_1 = \{ y \in A : \exists j \, ( l(x) \le j \le l(y) \text{ and } M(y^j) > \frac{1}{(1-\mu)P(A|x)} M(x) ) \}$.

A set of words $B$ is called prefix free if for any two distinct words $x, y \in B$
the conditions $x \not\subseteq y$ and $y \not\subseteq x$ hold.

By the definition of a supermartingale, for any prefix free set $B$ such that
$x \subseteq y$ for all $y \in B$, the inequality

$M(x) \ge \sum_{y \in B} M(y) P(y|x)$ (30)

holds. For any $y \in A_1$ let $y^p$ be the initial fragment of $y$ of maximal length such
that

$\frac{M(y^p)}{M(x)} > \frac{1}{(1-\mu)P(A|x)}$.

The set $\{y^p : y \in A_1\}$ is prefix free. Then by (30), with sums taken over the
elements of this set, we have

$1 \ge \sum \frac{M(y^p)}{M(x)} P(y^p|x) > \frac{1}{(1-\mu)P(A|x)} \sum P(y^p|x) \ge \frac{1}{(1-\mu)P(A|x)} P(A_1|x)$.

From this we obtain $P(A_1|x) < (1-\mu)P(A|x)$. Define

$A' = A - \{ y \in A : z \subseteq y \text{ for some } z \in A_1 \}$.

Then $P(A'|x) > \mu P(A|x)$. For any $y \in A'$ we have

$M(y^j) \le \frac{1}{(1-\mu)P(A|x)} M(x)$

for all $l(x) \le j \le l(y)$. The result of the proposition follows from the inequality
$d_P(x) \le \log M(x)$. △

6 Appendix 2



Method of cutting and stacking. An arbitrary measurable mapping of a
probability space into itself is called a transformation or a process. A
transformation $T$ preserves a measure $P$ if $P(T^{-1}(A)) = P(A)$ for all measurable
subsets $A$ of the space. A subset $A$ is called invariant with respect to $T$ if
$T^{-1}A = A$. A transformation $T$ is called ergodic if each subset $A$ invariant with
respect to $T$ has measure 0 or 1.

The simplest example of such a transformation of the space $A^\infty$ of all infinite
sequences, where $A = \{0, 1, \dots, k-1\}$ is some finite alphabet, is the (left) shift $T$
defined by $(T\omega)_i = \omega_{i+1}$ for all $i = 1, 2, \dots$. If the shift $T$ preserves the measure
$P$ then this measure is called stationary, i.e.

$P\{\omega : \omega_i = x_1, \dots, \omega_{i+k-1} = x_k\} = P\{\omega : \omega_1 = x_1, \dots, \omega_k = x_k\}$

for all positive integers $i, k \ge 1$ and all $x_1, \dots, x_k$ equal to 0 or 1.
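The stationarity condition can be verified numerically for a concrete measure. The sketch below is an illustration under our own assumptions: the measure is a two-state Markov chain with a hypothetical transition matrix `TRANS`, started either from its stationary distribution or from the uniform one.

```python
from itertools import product

def word_prob(word, init, trans):
    """Probability that the chain starts with the given word (tuple of 0/1)."""
    p = init[word[0]]
    for a, b in zip(word, word[1:]):
        p *= trans[a][b]
    return p

def shifted_prob(x, i, init, trans):
    """P{omega_i ... omega_{i+k-1} = x}: sum over the i-1 free leading letters."""
    return sum(word_prob(prefix + tuple(x), init, trans)
               for prefix in product((0, 1), repeat=i - 1))

TRANS = [[0.9, 0.1], [0.4, 0.6]]     # hypothetical transition probabilities
STATIONARY = [0.8, 0.2]              # solves pi = pi * TRANS
UNIFORM = [0.5, 0.5]                 # not stationary for TRANS
```

With `STATIONARY` the value of `shifted_prob(x, i, ...)` does not depend on `i`; with `UNIFORM` it does, for example `shifted_prob((0,), 2, UNIFORM, TRANS)` equals 0.65 rather than 0.5.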

Recall some notions of symbolic dynamics. Let us consider the uniform measure
$\lambda$ on the unit interval $[0, 1)$ and a transformation $T$ of this interval. A partition
is a sequence of pairwise disjoint subsets $\pi = (\pi_1, \dots, \pi_k)$ of the interval $[0, 1)$
whose union is equal to this interval. A transformation $T$ defines a measure on the
set of all finite and infinite words of the alphabet $A = \{0, 1, \dots, k-1\}$ as follows:

$P(a_1 a_2 \dots a_n) = \lambda\{\omega : \omega \in [0, 1), T^i \omega \in \pi_{a_i}, i = 1, 2, \dots, n\}$, (31)

where $a_1 a_2 \dots a_n$ is a sequence of letters from $A$. The measure $P$ can be extended
to all Borel subsets of $A^\infty$ in a natural fashion [12]. The measure $P$ defined
by (31) is stationary and ergodic with respect to the left shift if and only if the
transformation $T$ has the same properties.

We use the cutting and stacking method of constructing ergodic processes
[22], [23]. Recall the main notions and properties of this method. A column is a
sequence $E = (L_1, \dots, L_h)$ of pairwise disjoint subintervals of the unit interval of
equal width; $L_1$ is the base, $L_h$ is the top of the column, $\hat{E} = \cup_{j=1}^{h} L_j$ is the
support of the column, $w(E) = \lambda(L_1)$ is the width of the column, $h$ is the height of
the column, and $\lambda(\hat{E}) = \lambda(\cup_{j=1}^{h} L_j)$ is the measure of the column. Any column
defines an algorithmically effective transformation $T$ which linearly transforms
$L_j$ to $L_{j+1}$ for all $j = 1, \dots, h-1$. This transformation $T$ is not defined outside
the intervals of the column or at the points of the top interval of the column.
Denote $T^0\omega = \omega$, $T^{i+1}\omega = T(T^i\omega)$. For any $1 \le j \le h$ an arbitrary point $\omega \in L_j$
generates a finite trajectory $\omega, T\omega, \dots, T^{h-j}\omega$. A partition $\pi = (\pi_1, \dots, \pi_k)$ is
compatible with a column $E$ if for each $j$ there exists an $i$ such that $L_j \subseteq \pi_i$. This
number $i$ is called the name of the interval $L_j$, and the corresponding sequence of
names of all intervals of the column is called the name of the column $E$. For any
point $\omega \in L_j$, where $1 \le j \le h$, by the $E$-name of the trajectory $\omega, T\omega, \dots, T^{h-j}\omega$
we mean the sequence of names of the intervals $L_j, \dots, L_h$ from the column $E$. The
length of this sequence is $h - j + 1$.
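A column and its transformation are easy to model directly. The sketch below is a minimal illustration, assuming each interval $L_j$ is given by its left endpoint and the common width; the class name `Column` and its methods are hypothetical helpers, and levels are indexed from 0 in the code.

```python
from fractions import Fraction

class Column:
    """A column (L_1, ..., L_h) of half-open intervals [a_j, a_j + width)."""

    def __init__(self, left_ends, width, names):
        self.left = [Fraction(a) for a in left_ends]
        self.width = Fraction(width)
        self.names = list(names)          # name of each interval, base to top
        self.height = len(self.left)

    def level(self, w):
        """0-based index of the interval containing w, or None if outside."""
        for j, a in enumerate(self.left):
            if a <= w < a + self.width:
                return j
        return None

    def T(self, w):
        """Map L_j linearly onto L_{j+1}; undefined (None) on the top and outside."""
        j = self.level(w)
        if j is None or j == self.height - 1:
            return None
        return self.left[j + 1] + (w - self.left[j])

    def trajectory_name(self, w):
        """Names of the intervals visited by w, T w, ... up to the top."""
        j = self.level(w)
        return None if j is None else self.names[j:]

col = Column([0, Fraction(1, 2), Fraction(1, 4)], Fraction(1, 4), [0, 1, 1])
```

For the point `Fraction(1, 8)` in the base, `col.T` moves it to `5/8`, then to `3/8`, and is undefined on the top interval; its trajectory name is the full column name `[0, 1, 1]`.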

A gadget is a finite collection of disjoint columns. The width of the gadget
$w(\Upsilon)$ is the sum of the widths of its columns. A union of gadgets $\Upsilon_i$ with disjoint
supports is the gadget $\Upsilon = \cup\Upsilon_i$ whose columns are the columns of all the $\Upsilon_i$.
The support of the gadget $\Upsilon$ is the union $\hat\Upsilon$ of the supports of all its columns. A
transformation $T(\Upsilon)$ is associated with a gadget $\Upsilon$ if it is the union of the
transformations defined on all columns of $\Upsilon$. With any gadget $\Upsilon$ the corresponding
set of finite trajectories generated by points of its columns is associated. By the
$\Upsilon$-name of a trajectory we mean its $E$-name, where $E$ is the column of $\Upsilon$ to which
this trajectory corresponds. A gadget $\Upsilon$ extends a column $\Lambda$ if the support of $\Upsilon$
extends the support of $\Lambda$, the transformation $T(\Upsilon)$ extends the transformation
$T(\Lambda)$, and the partition corresponding to $\Upsilon$ extends the partition corresponding
to $\Lambda$.

The commonly used cutting and stacking operations will now be defined.
The distribution of a gadget $\Upsilon$ with columns $E_1, \dots, E_n$ is the vector of
probabilities

$\left( \frac{w(E_1)}{w(\Upsilon)}, \dots, \frac{w(E_n)}{w(\Upsilon)} \right)$.

A gadget $\Upsilon$ is a copy of a gadget $\Lambda$ if they have the same distribution and the
corresponding columns have the same partition names. A gadget $\Upsilon$ can be cut
into $M$ copies of itself $\Upsilon_m$, $m = 1, \dots, M$, according to a given probability vector
$(\gamma_1, \dots, \gamma_M)$ by cutting each column $E_i = (L_{i,j} : 1 \le j \le h(E_i))$ (and its intervals)
into disjoint subcolumns $E_{i,m} = (L_{i,j,m} : 1 \le j \le h(E_i))$ such that
$w(E_{i,m}) = w(L_{i,j,m}) = \gamma_m w(L_{i,j})$. The gadget $\Upsilon_m = \{E_{i,m} : 1 \le i \le L\}$ is called the
copy of the gadget $\Upsilon$ of width $\gamma_m$. The action of the gadget transformation $T$ is not
affected by the copying operation.
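The cutting operation can be sketched abstractly if a column is represented only by its width and its name, ignoring the actual position of its intervals. This representation, and the function names, are our own simplifying assumptions.

```python
from fractions import Fraction

def gadget_width(gadget):
    """Sum of the widths of the columns; a gadget is a list of (width, name) pairs."""
    return sum(w for w, _ in gadget)

def distribution(gadget):
    """Vector of relative column widths w(E_i) / w(gadget)."""
    total = gadget_width(gadget)
    return [w / total for w, _ in gadget]

def cut(gadget, gammas):
    """Cut a gadget into copies: copy m keeps every column, scaled by gammas[m]."""
    assert sum(gammas) == 1, "gammas must be a probability vector"
    return [[(g * w, name) for w, name in gadget] for g in gammas]

gadget = [(Fraction(1, 2), (0,)), (Fraction(1, 4), (1, 1))]
copies = cut(gadget, [Fraction(1, 3), Fraction(2, 3)])
```

Every copy has the same distribution and the same column names as the original gadget, and the widths of the copies add up to the original width.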

Another operation is the stacking of gadgets onto gadgets. First we consider
the stacking of columns onto columns and the stacking of gadgets onto columns.

Let $E_1 = (L_{1,j} : 1 \le j \le h(E_1))$ and $E_2 = (L_{2,j} : 1 \le j \le h(E_2))$ be two
columns of equal width whose supports are disjoint. The new column
$E_1 * E_2 = (L_j : 1 \le j \le h(E_1) + h(E_2))$ is defined by $L_j = L_{1,j}$ for all
$1 \le j \le h(E_1)$ and $L_j = L_{2,j-h(E_1)}$ for all $h(E_1) < j \le h(E_1) + h(E_2)$. Let a gadget
$\Upsilon$ and a column $E$ have the same width and disjoint supports. A new gadget $E * \Upsilon$
is defined as follows. Cut $E$ into subcolumns $E_i$ according to the distribution of the
gadget $\Upsilon$ such that $w(E_i) = w(U_i)$, where $U_i$ is the $i$-th column of the gadget $\Upsilon$.
Stack $U_i$ on top of $E_i$ to get the new column $E_i * U_i$. The new gadget consists
of the columns $(E_i * U_i)$.

Let $\Upsilon$ and $\Lambda$ be two gadgets of the same width and with disjoint supports. A
gadget $\Upsilon * \Lambda$ is defined as follows. Let the columns of $\Upsilon$ be $(E_i)$. Cut $\Lambda$ into
copies $\Lambda_i$ such that $w(\Lambda_i) = w(E_i)$ for all $i$. After that, for each $i$ stack the gadget
$\Lambda_i$ onto the column $E_i$, i.e. consider the gadget $E_i * \Lambda_i$. The new gadget is the union
of the gadgets $E_i * \Lambda_i$ for all $i$. The number of columns of the gadget $\Upsilon * \Lambda$ is the
product of the number of columns of $\Upsilon$ and the number of columns of $\Lambda$.

The $M$-fold independent cutting and stacking of a single gadget $\Upsilon$ is defined
by cutting $\Upsilon$ into $M$ copies $\Upsilon_i$, $i = 1, \dots, M$, of equal width and successively
independently cutting and stacking them to obtain $\Upsilon^{*(M)} = \Upsilon_1 * \dots * \Upsilon_M$.
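Stacking and $M$-fold independent cutting and stacking can be sketched with the same abstract representation, in which a column is a pair (width, name) and the name lists the interval names from base to top. This is an illustration under that simplifying assumption; the function names are ours.

```python
from fractions import Fraction

def gadget_width(gadget):
    return sum(w for w, _ in gadget)

def cut(gadget, gammas):
    """Copies of a gadget with all column widths scaled by each gamma."""
    return [[(g * w, name) for w, name in gadget] for g in gammas]

def stack(lower, upper):
    """Stack gadget `upper` onto gadget `lower` (both of the same width).

    For each column E of `lower`, `upper` is cut into a copy of width w(E),
    and each column D of that copy is stacked on a subcolumn of E of equal
    width, giving a column of width w(E) w(D) / w(upper) with name E + D.
    """
    assert gadget_width(lower) == gadget_width(upper)
    total = gadget_width(upper)
    return [(w_e * w_d / total, name_e + name_d)
            for w_e, name_e in lower
            for w_d, name_d in upper]

def independent_cut_and_stack(gadget, M):
    """Cut into M copies of equal width and stack them successively."""
    copies = cut(gadget, [Fraction(1, M)] * M)
    result = copies[0]
    for c in copies[1:]:
        result = stack(result, c)
    return result

base = [(Fraction(1, 2), (0,)), (Fraction(1, 2), (1,))]
stacked = independent_cut_and_stack(base, 3)
```

Starting from two columns of height 1 named 0 and 1, three-fold independent cutting and stacking yields eight columns of equal width whose names are all binary words of length 3, so every name of a given length is equally likely.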

Remark 1. Several examples of stationary measures constructed using the cutting
and stacking method are given in [22], [23]. We use in Section 3 a construction of
a sequence of gadgets defining the uniform Bernoulli distribution on trajectories
generated by them. This sequence is constructed using the following scheme. Let a
partition $\pi = (\pi_0, \pi_1)$ be given. Let also $\Lambda$ be a gadget such that its columns have
the same width and are compatible with the partition $\pi$. Let $\lambda(\hat\Lambda \cap \pi_0) = \lambda(\hat\Lambda \cap \pi_1)$.
Suppose that for some $M$ a gadget $\Lambda'$ is constructed from the gadget $\Lambda$ by means
of $M$-fold independent cutting and stacking and $P$ is the measure on trajectories
of the gadget $\Lambda'$ defined by (31). Then by the method of cutting and stacking
$P(x) = 2^{-l(x)} \lambda(\hat\Lambda)$ for the trajectory $x$ of any point from the support of $\Lambda'$.

A sequence of gadgets $\{\Upsilon_m\}$ is complete if

• $\lim_{m \to \infty} w(\Upsilon_m) = 0$;

• $\lim_{m \to \infty} \lambda(\hat\Upsilon_m) = 1$;

• $\Upsilon_{m+1}$ extends $\Upsilon_m$ for all $m$.

Any complete sequence of gadgets $\{\Upsilon_m\}$ determines a transformation $T = T(\{\Upsilon_m\})$
which is defined on the interval $[0, 1)$ almost surely.

By definition $T$ preserves the measure $\lambda$. In [22] and [23] conditions
sufficient for a process $T$ to be ergodic were suggested. Let a gadget $\Upsilon$ be
constructed by cutting and stacking from a gadget $\Lambda$. Let $E$ be a column from $\Upsilon$
and $D$ be a column from $\Lambda$. Then $\hat{E} \cap \hat{D}$ is defined as the union of the subcolumns
from $D$ of width $w(E)$ which were used in the construction of $E$.

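Let $0 < \epsilon < 1$. A gadget $\Lambda$ is $(1 - \epsilon)$-well-distributed in $\Upsilon$ if

$\sum_{D \in \Lambda} \sum_{E \in \Upsilon} |\lambda(\hat{E} \cap \hat{D}) - \lambda(\hat{E})\lambda(\hat{D})| < \epsilon$. (32)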

We will use the following two lemmas. 

Lemma 2 ([22], Corollary 1), ([23], Theorem A.1). Let $\{\Upsilon_n\}$ be a complete
sequence of gadgets and for each $n$ let the gadget $\Upsilon_n$ be $(1 - \epsilon_n)$-well-distributed in
$\Upsilon_{n+1}$, where $\epsilon_n \to 0$. Then $\{\Upsilon_n\}$ defines an ergodic process.






Lemma 3 ([23], Lemma 2.2). For any $\epsilon > 0$ and any gadget $\Upsilon$ there is an $M$
such that for each $m \ge M$ the gadget $\Upsilon$ is $(1 - \epsilon)$-well-distributed in the gadget
constructed from $\Upsilon$ by $m$-fold independent cutting and stacking.